A Brief Index for Proximity Searching
Authors
Abstract
Many pattern recognition tasks can be modeled as proximity searching. From nearest neighbor classification to multimedia databases, the common task is to quickly find all the elements close to a given query. This task can be accomplished very easily by sequentially examining all the elements in the collection, but that turns out to be impractical in two situations: when the distance used to compare elements is expensive, or when the number of elements is very large (on the order of billions of objects). Recently, previous approaches have been improved by using permutations instead of distances to predict proximity. Every object in the database records how it sees the set of reference objects (the permutants), i.e., only their relative positions are used. When a query arrives, the relative displacement of the permutants between the query and a particular object is measured. In practice, the permutation of every object is represented with κ short integers, producing bulky indexes of size κn. In this paper we show how to represent the permutation as a binary vector, using just one bit for each permutant (instead of log κ in the plain representation). The Hamming distance between binary signatures is then used to predict proximity between objects in the database. We tested this approach with many real-life metric databases, obtaining a recall close to that of the Spearman ρ while using 16 times less space.
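The idea of a one-bit-per-permutant signature compared by Hamming distance can be illustrated with a minimal Python sketch. The abstract does not say how each bit is derived, so the rule used here (set bit i when permutant i sits more than `threshold` positions away from position i) is an assumption made for illustration, as are the function names `permutation`, `brief_signature`, and `hamming`.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def permutation(obj: T, permutants: Sequence[T],
                dist: Callable[[T, T], float]) -> List[int]:
    """Return rank[i]: the position of permutant i when all permutants
    are sorted by increasing distance to obj."""
    order = sorted(range(len(permutants)),
                   key=lambda i: dist(obj, permutants[i]))
    rank = [0] * len(permutants)
    for pos, i in enumerate(order):
        rank[i] = pos
    return rank


def brief_signature(rank: Sequence[int], threshold: int) -> int:
    """Pack one bit per permutant into an int.
    ASSUMED rule (not stated in the abstract): bit i is 1 when
    permutant i is displaced more than `threshold` from position i."""
    sig = 0
    for i, pos in enumerate(rank):
        if abs(pos - i) > threshold:
            sig |= 1 << i
    return sig


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


# Toy example: points on a line, absolute difference as the metric.
permutants = [0.0, 1.0, 2.0, 3.0]
dist = lambda x, y: abs(x - y)

sig_near = brief_signature(permutation(0.1, permutants, dist), threshold=1)
sig_far = brief_signature(permutation(2.9, permutants, dist), threshold=1)
print(hamming(sig_near, sig_far))
```

With κ permutants, each signature takes κ bits rather than κ short integers, which is consistent with the 16-fold space reduction the abstract reports if a short integer is 16 bits.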
Related resources
Compact and Efficient Permutations for Proximity Searching
Proximity searching consists in retrieving the most similar objects to a given query. This kind of searching is a basic tool in many fields of artificial intelligence, because it can be used as a search engine to solve problems like kNN searching. A common technique to solve proximity queries is to use an index. In this paper, we show a variant of the permutation based index, which, in his orig...
Brief Communication Adjacency and proximity searching in the Science Citation Index and Google
We have developed simple algorithms that allow adjacency and proximity searching in Google and the Science Citation Index (SCI). The SCI algorithm exploits the fact that SCI stopwords in a search phrase function as a placeholder. Such a phrase serves effectively as a fixed adjacency condition determined by the number n of adjacent stopwords (i.e., retrieve all records where word A and word B ar...
Using the k-Nearest Neighbor Graph for Proximity Searching in Metric Spaces
Proximity searching consists in retrieving from a database, objects that are close to a query. For this type of searching problem, the most general model is the metric space, where proximity is defined in terms of a distance function. A solution for this problem consists in building an offline index to quickly satisfy online queries. The ultimate goal is to use as few distance computations as p...
Boosting the Permutation Based Index for Proximity Searching
Proximity searching consists in retrieving objects out of a database similar to a given query. Nowadays, when multimedia databases are growing up, this is an elementary task. The permutation based index (PBI) and its variants are excellent techniques to solve proximity searching in high dimensional spaces, however they have been surmountable in low dimensional ones. Another PBI’s drawback is th...
Adjacency and Proximity Searching in the Science Citation Index and Google
We have developed simple algorithms that allow adjacency and proximity searching in Google and the Science Citation Index (SCI). The SCI algorithm exploits the fact that SCI stopwords in a search phrase function as a placeholder. Such a phrase serves effectively as a fixed adjacency condition determined by the number n of adjacent stopwords (i.e., retrieve all records where word A and word B ar...